Acoustic modeling for spontaneous speech recognition using syllable dependent models
Authors
Abstract
This paper proposes a syllable context dependent model for spontaneous speech recognition. It is generally assumed that, since spontaneous speech is greatly affected by coarticulation, an acoustic model featuring a longer-range phonemic context is required to achieve a high degree of recognition accuracy. This motivated the authors to investigate a tri-syllable model that takes differences in the preceding and succeeding syllables into account. Since Japanese syllables consist of either a single vowel or a consonant-vowel combination, a tri-syllable model always accounts for the preceding and succeeding vowels, which are the primary factors in coarticulation. A tri-syllable model is thus capable of efficiently representing coarticulation. The tri-syllable model was trained using spontaneous speech; then, its effectiveness was evaluated on continuous syllable recognition and on statistical language model-based continuous word recognition. Compared to a regular triphone model without state sharing, the correct syllable accuracy of the continuous syllable recognition improved from 64.9% to 66.3%, and the word recognition accuracy for the statistical language model-based continuous word recognition improved from 88.4% to 89.2%.
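The tri-syllable units described above are analogous to triphones, but with syllables as the center unit. A minimal sketch of how such left/right context-dependent unit labels might be generated from a syllable sequence (the `left-center+right` notation and the `sil` boundary symbol are illustrative assumptions borrowed from common triphone conventions, not taken from the paper):

```python
# Hypothetical sketch: derive tri-syllable context-dependent unit labels
# from a Japanese syllable sequence, in the style of triphone labels.
# The "sil" boundary symbol and "L-C+R" notation are assumptions.

def tri_syllable_units(syllables, boundary="sil"):
    """Return one left/right context-dependent label per syllable."""
    padded = [boundary] + list(syllables) + [boundary]
    units = []
    for i in range(1, len(padded) - 1):
        left, center, right = padded[i - 1], padded[i], padded[i + 1]
        units.append(f"{left}-{center}+{right}")
    return units

# "arayuru" segmented into V/CV syllables: a, ra, yu, ru
print(tri_syllable_units(["a", "ra", "yu", "ru"]))
# → ['sil-a+ra', 'a-ra+yu', 'ra-yu+ru', 'yu-ru+sil']
```

Because each Japanese syllable ends in a vowel, the left context of every unit always carries the preceding vowel, which is how the model captures the dominant coarticulation effect.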
Similar papers
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the blending of adjacent sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Syllable-based acoustic modeling for Japanese spontaneous speech recognition
We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based units and mora-based units are clearly distinguished in their definition, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...
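The mora/syllable distinction above can be made concrete with hand-annotated examples under the standard linguistic analysis of Japanese (the segmentations below are illustrative annotations, not output of the paper's method; `Q` marks the geminate mora and `N` the moraic nasal):

```python
# Illustrative contrast between mora-based and syllable-based units for
# Japanese. Segmentations are hand-annotated examples under the common
# linguistic analysis; "Q" = geminate (sokuon) mora, "N" = moraic nasal.

examples = {
    # word: (mora segmentation, syllable segmentation)
    "toukyou": (["to", "u", "kyo", "u"], ["tou", "kyou"]),
    "nippon":  (["ni", "Q", "po", "N"],  ["nip", "pon"]),
}

for word, (morae, syllables) in examples.items():
    print(f"{word}: {len(morae)} morae {morae} "
          f"vs {len(syllables)} syllables {syllables}")
```

A syllable-based inventory is smaller per utterance (fewer, longer units) and keeps vowel context attached to the unit, which is one reason it can suit spontaneous speech better than mora units.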
Sub-syllable Acoustic Modeling for Cantonese Speech Recognition
This paper presents a pioneering study on acoustic modeling for continuous Cantonese speech recognition. It starts from the context-independent modeling of sub-syllabic units, namely INITIALs and FINALs, and then moves on to examine a number of context-dependent models that characterize intra-syllable co-articulation. The acoustic models are trained with a large database of Cantonese polysyllabic ...
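The INITIAL/FINAL decomposition mentioned above splits each Cantonese syllable into an optional onset consonant and the remaining rime. A minimal sketch using Jyutping romanization with tones stripped (the partial initial inventory and greedy longest-match rule are assumptions for illustration, not the paper's actual unit set):

```python
# Hypothetical sketch: split Cantonese syllables (Jyutping, tone digits
# removed) into INITIAL and FINAL sub-syllabic units. The initial
# inventory is a partial illustrative list, not the paper's.

INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "ng",
     "h", "gw", "kw", "w", "z", "c", "s", "j"],
    key=len, reverse=True,  # try longer initials (gw, kw, ng) first
)

def split_initial_final(syllable):
    """Greedy longest-match split into (INITIAL, FINAL)."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # null INITIAL, e.g. vowel-only syllables

print(split_initial_final("gwong"))  # ('gw', 'ong')
print(split_initial_final("aa"))     # ('', 'aa')
```

Modeling INITIALs and FINALs separately keeps the unit inventory small while still letting context-dependent variants capture intra-syllable coarticulation.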
Improvements in English ASR for the MALACH Project Using Syllable-centric Models
LVCSR systems have traditionally used phones as the basic acoustic unit for recognition. Syllable and other longer length units provide an efficient means for modeling long-term temporal dependencies in speech that are difficult to capture in a phone based recognition framework. However, it is well known that longer duration units suffer from training data sparsity problems since a large number...
A Syllable, Articulatory-feature, and Stress-accent Model of Speech Recognition
Current-generation automatic speech recognition (ASR) systems assume that words are readily decomposable into constituent phonetic components ("phonemes"). A detailed linguistic dissection of state-of-the-art speech recognition systems indicates that the conventional phonemic "beads-on-a-string" approach is of limited utility, particularly with respect to informal, conversational material. The ...
Publication date: 2000